Skip to content

Mark failed one-hop HTLCs as retrably #1702

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

TheBlueMatt
Copy link
Collaborator

When our counterparty is the payment destination and we receive
an HTLCFailReason::Reason in fail_htlc_backwards_internal we
currently always set rejected_by_dest in the PaymentPathFailed
event, implying the HTLC should not be retried.

There are a number of cases where we use HTLCFailReason::Reason,
but most should reasonably be treated as retryable even if our
counterparty was the destination (i.e. !rejected_by_dest):

  • If an HTLC times out on-chain, this doesn't imply that the
    payment is no longer retryable, though the peer may well be
    offline so retrying may not be very useful,
  • If a commitment transaction "containing" a dust HTLC is
    confirmed on-chain, this definitely does not imply the payment
    is no longer retryable
  • If the channel we intended to relay over was closed (or
    force-closed) we should retry over another path,
  • If the channel we intended to relay over did not have enough
    capacity we should retry over another path,
  • If we received a update_fail_malformed_htlc message from our
    peer, we likely should not retry, however this should be
    exceedingly rare, and appears to nearly never appear in practice

Thus, this commit simply disables the behavior here, opting to
treat all HTLCFailReason::Reason errors as retryable.

Note that prior to 93e645d this
change would not have made sense as it would have resulted in us
retrying the payment over the same channel in some cases, however
we now "blame" our own channel and will avoid it when routing for
the same payment.

While we're at it, we also take this opportunity to rename rejected_by_dest to payment_failed_permanently, which is much clearer.

When our counterparty is the payment destination and we receive
an `HTLCFailReason::Reason` in `fail_htlc_backwards_internal` we
currently always set `rejected_by_dest` in the `PaymentPathFailed`
event, implying the HTLC should *not* be retried.

There are a number of cases where we use `HTLCFailReason::Reason`,
but most should reasonably be treated as retryable even if our
counterparty was the destination (i.e. `!rejected_by_dest`):
 * If an HTLC times out on-chain, this doesn't imply that the
   payment is no longer retryable, though the peer may well be
   offline so retrying may not be very useful,
 * If a commitment transaction "containing" a dust HTLC is
   confirmed on-chain, this definitely does not imply the payment
   is no longer retryable
 * If the channel we intended to relay over was closed (or
   force-closed) we should retry over another path,
 * If the channel we intended to relay over did not have enough
   capacity we should retry over another path,
 * If we received a update_fail_malformed_htlc message from our
   peer, we likely should *not* retry, however this should be
   exceedingly rare, and appears to nearly never appear in practice

Thus, this commit simply disables the behavior here, opting to
treat all `HTLCFailReason::Reason` errors as retryable.

Note that prior to 93e645d this
change would not have made sense as it would have resulted in us
retrying the payment over the same channel in some cases, however
we now "blame" our own channel and will avoid it when routing for
the same payment.
The `rejected_by_dest` field of the `PaymentPathFailed` event has
always been a bit of a misnomer, as its really more about retry
than where a payment failed. Now is as good a time as any to
rename it.
@codecov-commenter
Copy link

codecov-commenter commented Sep 7, 2022

Codecov Report

Merging #1702 (c57bb42) into main (e45db2b) will increase coverage by 0.01%.
The diff coverage is 94.91%.

@@            Coverage Diff             @@
##             main    #1702      +/-   ##
==========================================
+ Coverage   90.91%   90.93%   +0.01%     
==========================================
  Files          86       86              
  Lines       46417    46840     +423     
  Branches    46417    46840     +423     
==========================================
+ Hits        42201    42594     +393     
- Misses       4216     4246      +30     
Impacted Files Coverage Δ
lightning/src/routing/gossip.rs 91.43% <ø> (ø)
lightning/src/util/events.rs 39.78% <25.00%> (+0.27%) ⬆️
lightning-invoice/src/payment.rs 90.86% <100.00%> (ø)
lightning/src/ln/chanmon_update_fail_tests.rs 97.72% <100.00%> (ø)
lightning/src/ln/channelmanager.rs 86.05% <100.00%> (+1.07%) ⬆️
lightning/src/ln/functional_test_utils.rs 93.58% <100.00%> (ø)
lightning/src/ln/functional_tests.rs 96.91% <100.00%> (-0.17%) ⬇️
lightning/src/ln/monitor_tests.rs 99.55% <100.00%> (ø)
lightning/src/ln/onion_route_tests.rs 97.68% <100.00%> (ø)
lightning/src/ln/payment_tests.rs 98.89% <100.00%> (ø)
... and 5 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Contributor

@tnull tnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you're already renaming these variables and changing the semantics, maybe you could also rename/change payment_retryable, since the variable now doesn't paint the whole picture anyways? This also might make things a bit straight forward, since currently payments_retryable == !is_final_node and payment_failed_permanently == !payments_retryable. Maybe it would therefore be easier to avoid one step of negation and just return is_final_node a.k.a. rejected_by_dest from onion_utils::process_onion_failure()?

@TheBlueMatt
Copy link
Collaborator Author

I'm a little confused - payment_retryable (in fail_htlc_backwards_internal) does paint the full picture - the payment is either failed_permanently or its retryable, and they're negated. I agree we could invert the return value from process_onion_failure but of course its not that simple, its constructed in a bunch of places and I'm not a huge fan of touching that code in an otherwise-unrelated PR.

@tnull
Copy link
Contributor

tnull commented Sep 8, 2022

Alright, just found the double negation of essentially the same value in different places a bit harder to follow than it needed to be. But no strong opinion on it.

@TheBlueMatt
Copy link
Collaborator Author

Yea, I agree, but we can revisit it when we revisit the onion decode stuff separately, maybe in #1703.

@TheBlueMatt TheBlueMatt merged commit 5e65c49 into lightningdevkit:main Sep 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants